1 Evaluating estimators

Suppose you observe data X_1, ..., X_n that are iid observations with distribution F_θ indexed by some parameter θ. When trying to estimate θ, one may be interested in determining the properties of some estimator θ̂ of θ. In particular, the bias

Bias(θ̂) = E(θ̂ − θ)

may be of interest. That is, the average difference between the estimator and the truth. Estimators with Bias(θ̂) = 0 are called unbiased.

Another (possibly more important) property of an estimator is how close it tends to be to the truth on average. The most common choice for evaluating estimator precision is the mean squared error,

MSE(θ̂) = E((θ̂ − θ)²).

When comparing a number of estimators, MSE is commonly used as a measure of quality. By directly using the identity var(Y) = E(Y²) − E(Y)², with the random variable Y = θ̂ − θ, the above equation becomes

MSE(θ̂) = E(θ̂ − θ)² + var(θ̂ − θ) = Bias(θ̂)² + var(θ̂),

where the last expression follows from the definition of bias and the fact that var(θ̂ − θ) = var(θ̂), since θ is a constant.

For example, if X_1, ..., X_n are iid N(µ, σ²), then X̄ ~ N(µ, σ²/n). So the bias of X̄ as an estimator of µ is

Bias(X̄) = E(X̄ − µ) = µ − µ = 0

and the MSE is

MSE(X̄) = 0² + var(X̄) = σ²/n.

The above identity says that the precision of an estimator is a combination of the bias of that estimator and its variance. Therefore it is possible for a biased estimator to be more precise than an unbiased estimator if it is significantly less variable. This is known as the bias-variance tradeoff. We will see an example of this.
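The quantities just computed for X̄ are also easy to approximate by simulation. As a small illustration (the values µ = 1, σ = 2, n = 25 and k = 10000 are chosen here purely for the example), the Monte Carlo estimates of the bias, variance and MSE of X̄ should come out close to 0, σ²/n = 0.16 and 0.16, respectively:

# Assumed example values: mu = 1, sigma = 2, n = 25, k = 10000 replicates
mu <- 1; sigma <- 2; n <- 25; k <- 10000

# k datasets of size n, and the sample mean of each
Xbar <- apply( matrix(rnorm(n*k, mean=mu, sd=sigma), k, n), 1, mean )

mean(Xbar - mu)       # Monte Carlo estimate of the bias (close to 0)
var(Xbar)             # Monte Carlo estimate of the variance (close to 0.16)
mean((Xbar - mu)^2)   # Monte Carlo estimate of the MSE (close to 0.16)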

1.2 Using Monte Carlo to explore properties of estimators

In some cases it can be difficult to explicitly calculate the MSE for an estimator. When this happens, Monte Carlo can be a useful alternative to a very cumbersome mathematical calculation. The example below is an instance of this.

Example: Suppose X_1, ..., X_n are iid N(θ, θ²) and we are interested in estimation of θ. Two reasonable estimators of θ are the sample mean

θ̂₁ = (1/n) Σ_{i=1}^n X_i

and the sample standard deviation

θ̂₂ = √( (1/(n−1)) Σ_{i=1}^n (X_i − X̄)² ).

To compare these two estimators by Monte Carlo for a specific n and θ:

1. Generate X_1, ..., X_n ~ N(θ, θ²).
2. Calculate θ̂₁ and θ̂₂.
3. Save (θ̂₁ − θ)² and (θ̂₂ − θ)².
4. Repeat steps 1-3 k times.
5. Then the means of the (θ̂₁ − θ)²'s and the (θ̂₂ − θ)²'s, over the k replicates, are the Monte Carlo estimates of the MSEs of θ̂₁ and θ̂₂.

This basic approach can be used any time you are comparing estimators by Monte Carlo. The larger we choose k to be, the more accurate these estimates are. We implement this in R with the following code for θ = .5, .6, .7, ..., 10, n = 50, and k = 1000.

k = 1000
n = 50

# Sequence of values of theta
THETA <- seq(.5, 10, by=.1)

# Storage for the MSEs of each estimator
MSE <- matrix(0, length(THETA), 2)

# Loop through the values in THETA
for(j in 1:length(THETA))
{
  # Generate the k datasets of size n
  D <- matrix(rnorm(k*n, mean=THETA[j], sd=THETA[j]), k, n)

  # Calculate theta_hat1 (sample mean) for each data set
  ThetaHat_1 <- apply(D, 1, mean)

  # Calculate theta_hat2 (sample sd) for each data set
  ThetaHat_2 <- apply(D, 1, sd)

  # Save the MSEs
  MSE[j,1] <- mean( (ThetaHat_1 - THETA[j])^2 )
  MSE[j,2] <- mean( (ThetaHat_2 - THETA[j])^2 )
}

# Plot the results on the same axes
plot(THETA, MSE[,1], xlab=quote(theta), ylab="MSE",
     main=expression(paste("MSE for each value of ", theta)),
     type="l", col=2, cex.lab=1.3, cex.main=1.5)
lines(THETA, MSE[,2], col=4)

[Figure 1: Simulated values for the MSE of θ̂₁ and θ̂₂.]

From the plot we can see that θ̂₂, the sample standard deviation, is a uniformly better estimator of θ than θ̂₁, the sample mean. We can verify this simulation mathematically. Clearly the sample mean's MSE is

MSE(θ̂₁) = θ²/n.

The MSE for the sample standard deviation is somewhat more difficult. It is well known that, in general, the sample variance V from a normal population is distributed so that

(n−1)V/σ² ~ χ²_{n−1},

where σ² is the true variance. In this case θ̂₂ = √V.
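This distributional fact can itself be spot-checked by simulation before we use it: the quantiles of simulated values of (n−1)V/σ² should line up with the χ²_{n−1} quantiles. A brief sketch (the values n = 10, σ = 2 and k = 10000 are arbitrary choices for illustration):

# Assumed example values
n <- 10; sigma <- 2; k <- 10000

# k samples of size n from N(0, sigma^2), and the scaled sample variances
X <- matrix(rnorm(n*k, sd=sigma), k, n)
W <- (n-1) * apply(X, 1, var) / sigma^2

# Compare simulated quantiles with chi-square(n-1) quantiles;
# the points should fall close to the 45-degree line
p <- seq(.01, .99, by=.01)
plot(qchisq(p, df=n-1), quantile(W, p)); abline(0, 1)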

The χ² distribution with k degrees of freedom has density function

p(x) = ( (1/2)^{k/2} / Γ(k/2) ) x^{k/2 − 1} e^{−x/2},

where Γ is the gamma function. Using this we can derive the expected value of √V:

E(√V) = √(σ²/(n−1)) E( √((n−1)V/σ²) )
      = √(σ²/(n−1)) ∫₀^∞ √x ( (1/2)^{(n−1)/2} / Γ((n−1)/2) ) x^{(n−1)/2 − 1} e^{−x/2} dx,

which follows from the definition of expectation and the expression above for the χ² density. The trick now is to rearrange terms and factor out constants properly so that the integrand becomes another χ² density:

E(√V) = √(σ²/(n−1)) ( (1/2)^{(n−1)/2} / Γ((n−1)/2) ) ∫₀^∞ x^{n/2 − 1} e^{−x/2} dx
      = √(σ²/(n−1)) ( Γ(n/2) / Γ((n−1)/2) ) ( (1/2)^{(n−1)/2} / (1/2)^{n/2} ) ∫₀^∞ ( (1/2)^{n/2} / Γ(n/2) ) x^{n/2 − 1} e^{−x/2} dx,

where the integrand in the last line is the χ²_n density. Now we know that the integral in the last line is 1, since it has the form of a χ² density with n degrees of freedom. The rest is just simplifying constants:

E(√V) = √(σ²/(n−1)) ( Γ(n/2) / Γ((n−1)/2) ) (1/2)^{−1/2}
      = √(2σ²/(n−1)) Γ(n/2) / Γ((n−1)/2)
      = √(2/(n−1)) ( Γ(n/2) / Γ((n−1)/2) ) σ.

Therefore E(θ̂₂) = √(2/(n−1)) ( Γ(n/2) / Γ((n−1)/2) ) θ. So the bias is

Bias(θ̂₂) = E(θ̂₂) − θ = −θ ( 1 − √(2/(n−1)) Γ(n/2)/Γ((n−1)/2) ),

and since only Bias(θ̂₂)² enters the MSE, the sign is unimportant. To calculate the variance of θ̂₂ we also need E(θ̂₂²). θ̂₂² is the sample variance, which we know is an unbiased estimator of the variance θ², so

E(θ̂₂²) = θ²,

so the variance of θ̂₂ is

var(θ̂₂) = E(θ̂₂²) − E(θ̂₂)² = θ² ( 1 − (2/(n−1)) (Γ(n/2)/Γ((n−1)/2))² ).

Finally,

MSE(θ̂₂) = Bias(θ̂₂)² + var(θ̂₂)
         = θ² [ ( 1 − √(2/(n−1)) Γ(n/2)/Γ((n−1)/2) )² + 1 − (2/(n−1)) (Γ(n/2)/Γ((n−1)/2))² ].

It is a fact that

( 1 − √(2/(n−1)) Γ(n/2)/Γ((n−1)/2) )² + 1 − (2/(n−1)) (Γ(n/2)/Γ((n−1)/2))² < 1/n

for any n. This implies that MSE(θ̂₂) < MSE(θ̂₁) for any n and any θ.
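The inequality in the last display is easy to confirm numerically; a quick sketch (the range of sample sizes checked is an arbitrary choice, and lgamma is used instead of gamma only to avoid overflow for larger n):

# Check the bracketed factor in MSE(theta_hat2) against 1/n for n = 2, ..., 200
n <- 2:200
G <- exp( lgamma(n/2) - lgamma((n-1)/2) )   # Gamma(n/2) / Gamma((n-1)/2)
mse_factor <- (1 - sqrt(2/(n-1))*G)^2 + 1 - (2/(n-1))*G^2
all(mse_factor < 1/n)   # TRUE for every n checked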

We can check this derivation by plotting the MSEs and comparing with the simulation-based MSEs:

# for each Q, Q[1] is theta and Q[2] is n

# MSE of theta_hat1
MSE1 <- function(Q) (Q[1]^2)/Q[2]

# MSE of theta_hat2
MSE2 <- function(Q)
{
  theta <- Q[1]; n <- Q[2]
  G <- gamma(n/2)/gamma( (n-1)/2 )
  bias <- theta * (1 - sqrt(2/(n-1)) * G )
  variance <- (theta^2) * (1 - (2/(n-1)) * G^2 )
  return(bias^2 + variance)
}

# Grid of values for theta for n=50
THETA <- cbind(matrix( seq(.5, 10, length=100), 100, 1 ), rep(50,100))

# Storage for MSE of thetahat1 (column 1) and thetahat2 (column 2)
MSE <- matrix(0, 100, 2)

# MSE of theta_hat1 for each theta
MSE[,1] <- apply(THETA, 1, MSE1)

# MSE of theta_hat2 for each theta
MSE[,2] <- apply(THETA, 1, MSE2)

plot(THETA[,1], MSE[,1], xlab=quote(theta), ylab="MSE",
     main=expression(paste("MSE for each value of ", theta)),
     type="l", col=2, cex.lab=1.3, cex.main=1.5)
lines(THETA[,1], MSE[,2], col=4)

[Figure 2: True values for the MSE of θ̂₁ and θ̂₂.]

Clearly the conclusion is the same as in the simulated case: θ̂₂ has a lower MSE than θ̂₁ for any value of θ, but it was far less complicated to show this by simulation.

Exercise 1: Consider data X_1, ..., X_n iid N(µ, σ²) where we are interested in estimating σ² and µ is unknown. Two possible estimators are

θ̂₁ = (1/n) Σ_{i=1}^n (X_i − X̄)²

and the conventional unbiased sample variance

θ̂₂ = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)².

Estimate the MSE for each of these estimators when n = 15 for σ² = .5, .6, ..., 3, and evaluate which estimator is closer to the truth on average for each value of σ².

2 Properties of hypothesis tests

Consider deciding between two competing statistical hypotheses, H_0, the null hypothesis, and H_1, the alternative hypothesis, based on data X_1, ..., X_n. A test statistic is a function of the data T = T(X_1, ..., X_n) such that if T ∈ R_α then you reject H_0, otherwise you do not. The space R_α is called the rejection region and is chosen so that

P(reject H_0 | H_0 is true) = P(T ∈ R_α | H_0 is true) = α.

α is referred to as the level of the test, and is the probability of incorrectly rejecting H_0; α is typically chosen by the user, and .05 is a common choice. For example, in a two-sided z-test of H_0: µ = 0, when σ² is known, the rejection region is

R_α = (−∞, z_{α/2}) ∪ (z_{1−α/2}, ∞),

where z_a denotes the a-th quantile of a standard normal distribution. When α = .05 this yields the familiar rejection region (−∞, −1.96) ∪ (1.96, ∞).

A good hypothesis test is one that, for a small value of α, has large power, which is the probability of rejecting H_0 when H_0 is indeed false. When testing the hypothesis H_0: θ = θ_0 for some specific null value θ_0, and θ_true ≠ θ_0, the power is

Power(θ_true) = P(T ∈ R_α | θ = θ_true).

Some primary determinants of the power of a test are:

- The sample size
- The difference between the null value and the true value (generally referred to as the effect size)
- The variance in the observed data

In many settings practitioners are interested in either a) how far the true value of θ must be from θ_0, or b) for a fixed effect size, how large the sample size must be, in order for the power to reach some nominal level, say 80%. Inquiries of this type are referred to as power analysis.
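Before turning to power, the defining property of the level, namely that the z-test above rejects with probability α when H_0 is true, can be verified with a quick simulation. A minimal sketch (the choices n = 30, σ = 1 and k = 10000 are assumed here only for illustration):

# Assumed example values: n = 30 observations, sigma = 1, k = 10000 datasets
alpha <- .05
n <- 30
k <- 10000

# Generate k datasets under H0: mu = 0, and compute the z-statistic for each
X <- matrix(rnorm(n*k), k, n)
Z <- sqrt(n) * apply(X, 1, mean)   # sigma = 1, so Z = sqrt(n) * Xbar

# Proportion of z-statistics landing in the rejection region; should be near .05
mean( abs(Z) > qnorm(1 - alpha/2) )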

Example 2: Power of the two-sample z-test

Suppose you observe X_1, ..., X_n iid N(µ_X, σ²) and Y_1, ..., Y_m iid N(µ_Y, σ²) where µ_X, µ_Y are unknown, and σ² is known. We are interested in a two-sided test of the hypothesis H_0: µ_X − µ_Y = 0. A common statistic for testing such hypotheses (with equal sample sizes, m = n) is the z-statistic

T = √n (X̄ − Ȳ) / (√2 σ).

It is well known that, under H_0, T has a standard normal distribution. It can be shown that, for any value of µ_D = µ_X − µ_Y, this test is the most powerful level-α test of H_0. (Similarly, when the variances are unknown and the sample sizes/variances are potentially unequal, Student's t-test is the most powerful level-α test of this null hypothesis.) µ_D is the measure of effect size in this test, and Power(µ_D) is a monotonically increasing function of |µ_D|. For example, if µ_D is very small, intuitively we would be less likely to reject H_0 than if µ_D were large.

We will investigate the power of the two-sample z-test for sample sizes n = 10, 20, 30, 40, 50 as a function of the true mean difference µ_D. The larger the true σ², the smaller the power will be (for a fixed n and µ_D), but we will not investigate this effect in this example. Each data set will be generated with σ² = 1, the two samples will have equal sizes, and α = .05. The basic algorithm is:

1. Generate datasets of the form X_1, ..., X_n ~ N(0, 1) and Y_1, ..., Y_n ~ N(µ_D, 1).
2. Calculate T.
3. Save I = I(|T| > z_{1−α/2}).
4. Repeat steps 1-3 k times.
5. The mean of the k values of the I's is the Monte Carlo estimate of Power(µ_D).

# alpha level
alpha = .05

# number of simulation reps
k <- 1000

# sample sizes
n <- 10*c(1:5)

# the mu_D's
mu_D <- seq(0, 2, by=.1)

# storage for the estimated powers
Power <- matrix(0, length(mu_D), 5)

for(i in 1:5)
{
  for(j in 1:length(mu_D))
  {
    # Generate k datasets of size n[i]
    X <- matrix( rnorm(n[i]*k), k, n[i])
    Y <- matrix( rnorm(n[i]*k, mean=mu_D[j]), k, n[i])

    # Get sample means for each of the k datasets
    Xmeans <- apply(X, 1, mean)
    Ymeans <- apply(Y, 1, mean)

    # Calculate the z statistics
    T <- sqrt(n[i])*(Xmeans - Ymeans)/sqrt(2)

    # Indicators of the z-statistics being in the rejection region
    I <- (abs(T) > qnorm(1-(alpha/2)))

    # Save the estimated power
    Power[j,i] <- mean(I)
  }
}

plot(mu_D, Power[,1], xlab=quote(mu(D)), ylab=expression(
  paste("Power(", mu(D), ")")), col=2, cex.lab=1.3, cex.main=1.5,
  main=expression(paste("Power(", mu(D), ") vs. ", mu(D))), type="l" )
points(mu_D, Power[,1], col=2); points(mu_D, Power[,2], col=3)
points(mu_D, Power[,3], col=4); points(mu_D, Power[,4], col=5)
points(mu_D, Power[,5], col=6); lines(mu_D, Power[,2], col=3)
lines(mu_D, Power[,3], col=4); lines(mu_D, Power[,4], col=5)
lines(mu_D, Power[,5], col=6); abline(h=alpha)
legend(1.5, .3, c("n = 10", "n = 20", "n = 30", "n = 40", "n = 50"),
       pch=1, col=c(2:6), lty=1)

[Figure 3: Simulated power of the two-sample z-test for sample sizes n = 10, 20, 30, 40, 50 and µ_D ranging from 0 up to 2.]

It is actually straightforward to calculate the power of the two-sample z-test. If µ_D is the true mean difference, then T is a standard normal random variable shifted over by √n µ_D/√2 (recall σ = 1 here), since E(X̄ − Ȳ) = µ_D. Letting Z denote a standard normal random variable, the power as a function of µ_D is

Power(µ_D) = P(|T| > z_{1−α/2})
           = 1 − P( z_{α/2} ≤ T ≤ z_{1−α/2} )
           = 1 − P( z_{α/2} ≤ Z + √n µ_D/√2 ≤ z_{1−α/2} )
           = 1 − P( z_{α/2} − √n µ_D/√2 ≤ Z ≤ z_{1−α/2} − √n µ_D/√2 )
           = 1 − ( P(Z ≤ z_{1−α/2} − √n µ_D/√2) − P(Z ≤ z_{α/2} − √n µ_D/√2) )
           = 1 − ( Φ(z_{1−α/2} − √n µ_D/√2) − Φ(z_{α/2} − √n µ_D/√2) ),

where Φ denotes the standard normal CDF. Notice that as n → ∞,

lim_{n→∞} Φ(z_{1−α/2} − √n µ_D/√2) = Φ(−∞) = 0,

and similarly for Φ(z_{α/2} − √n µ_D/√2), therefore

lim_{n→∞} Power(µ_D) = 1.

In other words, no matter how small µ_D > 0 is, the power to detect it as significantly different from 0 goes to 1 as the sample size increases.
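As a quick plug-in illustration of the formula (the values n = 50, µ_D = 0.5 and α = .05 are chosen here just as an example): √n µ_D/√2 = √50 · 0.5/√2 = 2.5, so

Power(0.5) = 1 − ( Φ(1.96 − 2.5) − Φ(−1.96 − 2.5) ) ≈ 1 − Φ(−0.54) ≈ 0.71,

which in R is

# Power of the two-sample z-test at mu_D = 0.5 with n = 50, sigma = 1, alpha = .05
1 - ( pnorm(qnorm(.975) - 2.5) - pnorm(qnorm(.025) - 2.5) )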

To check this calculation we plot the theoretical power and compare it with the simulation:

# alpha level
alpha <- .05

# sample sizes
n <- 10*c(1:5)

# the mu_D's
mu_D <- seq(0, 2, by=.1)

# storage for the true powers
Power <- matrix(0, length(mu_D), 5)

for(i in 1:5)
{
  for(j in 1:length(mu_D))
  {
    Power[j,i] <- 1 - ( pnorm( qnorm(1-alpha/2) - sqrt(n[i])*mu_D[j]/sqrt(2) )
                      - pnorm( qnorm(alpha/2) - sqrt(n[i])*mu_D[j]/sqrt(2) ) )
  }
}

# plot the results
plot(mu_D, Power[,1], xlab=quote(mu(D)), ylab=expression(
  paste("Power(", mu(D), ")")), col=2, cex.lab=1.3, cex.main=1.5,
  main=expression(paste("Power(", mu(D), ") vs. ", mu(D))), type="l" )
points(mu_D, Power[,1], col=2); points(mu_D, Power[,2], col=3)
points(mu_D, Power[,3], col=4); points(mu_D, Power[,4], col=5)
points(mu_D, Power[,5], col=6); lines(mu_D, Power[,2], col=3)
lines(mu_D, Power[,3], col=4); lines(mu_D, Power[,4], col=5)
lines(mu_D, Power[,5], col=6); abline(h=alpha)
legend(1.5, .3, c("n = 10", "n = 20", "n = 30", "n = 40", "n = 50"),
       pch=1, col=c(2:6), lty=1)

[Figure 4: Theoretical power of the two-sample z-test for sample sizes n = 10, 20, 30, 40, 50 and µ_D ranging from 0 up to 2.]

We can see the theoretical calculation matches the simulation. In this case the power calculation is simple, but for most hypothesis tests power calculations are intractable, so simulation-based power analysis is the only option.
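The closed-form expression also makes the sample-size version of the power analysis question from earlier easy to answer: how large must n be for the power to reach, say, 80%? A brief sketch (the effect size µ_D = 0.5, σ = 1 and the 80% target are assumed here only for illustration):

# Assumed example values
alpha <- .05
delta <- 0.5   # true mean difference, with sigma = 1 as above

# Closed-form power of the two-sample z-test as a function of the per-group sample size n
power_z <- function(n)
  1 - ( pnorm( qnorm(1-alpha/2) - sqrt(n)*delta/sqrt(2) ) -
        pnorm( qnorm(alpha/2) - sqrt(n)*delta/sqrt(2) ) )

# Smallest integer n per group giving at least 80% power (about 63 here)
ceiling( uniroot(function(n) power_z(n) - .80, c(2, 1e4))$root )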

Exercise 2: Using a similar approach to the above, consider the same problem except X_1, ..., X_n ~ N(µ_X, σ_X²) and Y_1, ..., Y_n ~ N(µ_Y, σ_Y²) (both with equal sample size) where σ_X², σ_Y² are not known but are assumed to be equal. Use the statistic

T = √n (X̄ − Ȳ) / √(σ̂_X² + σ̂_Y²),

where σ̂_X² and σ̂_Y² are the unbiased sample variances from Exercise 1, calculated for each set of data. Under H_0, T has a t-distribution with 2n − 2 degrees of freedom. Estimate the power of this test for sample sizes n = 10, 20, 30, 40, 50 and for the true µ_X − µ_Y ranging from 0 up to 2. In this case the theoretical power calculation, although possible, is significantly more difficult.